@Pijukatel
Contributor

@Pijukatel Pijukatel commented Nov 20, 2025

Description

  • Add ResourceCollectionClient._getIterablePagination, which extends the return value of ResourceCollectionClient._list with an asyncIterator that can be used to iterate over individual items. (It is written generically and can be applied to other endpoints if desired.)

  • Apply _getIterablePagination to:

    • ActorCollectionClient.list
    • ActorEnvVarCollectionClient.list
    • ActorVersionCollectionClient.list
    • BuildCollectionClient.list
    • RunCollectionClient.list
    • DatasetCollectionClient.list
    • KeyValueStoreCollectionClient.list
    • RequestQueueCollectionClient.list
    • ScheduleCollectionClient.list
    • StoreCollectionClient.list
    • TaskCollectionClient.list
    • WebhookCollectionClient.list
    • WebhookDispatchCollectionClient.list
  • Add unit tests of async iteration for a single representative method, DatasetCollectionClient.list (it works the same way in all the classes).

Example usage

It can still be used the same way as before:

const actors = await apifyClient.store().list({ limit, offset });
// Paginated response with up to 1000 items with actor details
console.log(actors.items.length);

Or it can be used as an asyncIterator that yields individual items, fetching more than one chunk if needed. How many chunks are requested depends on the limit and offset options and on the number of items the API returns:

for await (const actor of apifyClient.store().list({ limit, offset })) {
    // Single actor details
    console.log(actor);
}

Issues

@github-actions github-actions bot added this to the 128th sprint - Tooling team milestone Nov 20, 2025
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Nov 20, 2025
@Pijukatel Pijukatel force-pushed the paginated-list-iterator branch from 604c018 to de2c43f Compare November 20, 2025 15:36
@Pijukatel Pijukatel changed the title feat: Add AsyncIterator to the apifyClient.list feat: Add AsyncIterator to the StoreCollectionClient.list return value Nov 20, 2025
@Pijukatel Pijukatel requested a review from B4nan November 20, 2025 15:42
@Pijukatel Pijukatel changed the title feat: Add AsyncIterator to the StoreCollectionClient.list return value feat: Add asyncIterator to the StoreCollectionClient.list return value Nov 20, 2025
export interface PaginationOptions {
    /** Position of the first returned entry. */
    offset?: number;
    /** Maximum number of entries requested. */
Member

Suggested change
- /** Maximum number of entries requested. */
+ /** Maximum number of entries requested for one chunk. */

not sure if page or chunk is better, but it should be clear this is a limit for the chunk and not a total limit for the async iterator

Contributor Author

This is actually the total limit for the whole iterator. Chunk size is limited by the length of the platform response; it is not limited by this code.

return {
    ...currentPage,
    async *[Symbol.asyncIterator]() {
        yield currentPage;
Member

so we are yielding the whole pages? i thought we would yield just the items, one by one

Contributor Author

Well, my idea was that the iterator should yield exactly the same type as the original response, so that you can use the same code to process it. We also already have the whole chunk in memory. For example:

const actors = await apifyClient.store().list({ limit, offset });
processActors(actors);

for await (const actorsChunk of actors) {
    processActors(actorsChunk);
}

But I guess you would prefer this?

const actors = await apifyClient.store().list({ limit, offset });
processActors(actors);

for await (const singleActor of actors) {
    processSingleActor(singleActor);
}

Member

If we return the full pages, it will make the usage harder (forcing users to use nested loops). I would yield items one by one, not pages.

Contributor Author

@Pijukatel Pijukatel Nov 24, 2025

Updated to allow iteration over individual items, while making API requests in chunks that are as large as possible.
So, for example, with limit 3000 it will make 3 requests (3 x 1000), but the for await loop will run 3000 times.
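
The behaviour described above can be sketched roughly as follows. This is only an illustration, not the code from the PR: `fetchPage`, `iterateItems`, `countIterations`, the 1000-item cap (`MAX_PAGE_SIZE`) and the fake 5000-item backend are all assumptions made up for the example.

```typescript
interface Page<T> {
    items: T[];
    total: number;
}

const MAX_PAGE_SIZE = 1000; // assumed cap on items per response
const ALL_ITEMS = Array.from({ length: 5000 }, (_, i) => i); // fake backend data

let requestCount = 0;

// Hypothetical stand-in for a single API call.
async function fetchPage(offset: number, limit?: number): Promise<Page<number>> {
    requestCount += 1;
    const size = Math.min(limit ?? MAX_PAGE_SIZE, MAX_PAGE_SIZE);
    return { items: ALL_ITEMS.slice(offset, offset + size), total: ALL_ITEMS.length };
}

// Yields individual items; `limit` is the total for the whole iteration.
async function* iterateItems(options: { offset?: number; limit?: number } = {}): AsyncGenerator<number> {
    let offset = options.offset ?? 0;
    let remaining = options.limit ?? Infinity;
    while (remaining > 0) {
        const page = await fetchPage(offset, Number.isFinite(remaining) ? remaining : undefined);
        if (page.items.length === 0) break; // nothing more to read
        yield* page.items;
        offset += page.items.length;
        remaining -= page.items.length;
        if (offset >= page.total) break; // reached the end of the list
    }
}

// Counts loop iterations and API requests for a given total limit.
async function countIterations(limit: number): Promise<{ iterations: number; requests: number }> {
    requestCount = 0;
    let iterations = 0;
    for await (const _item of iterateItems({ limit })) iterations += 1;
    return { iterations, requests: requestCount };
}
```

Under these assumptions, `await countIterations(3000)` reports 3000 iterations and 3 requests.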

@B4nan B4nan requested review from barjin and janbuchar November 21, 2025 10:15
@barjin
Member

barjin commented Nov 21, 2025

I agree w/ both your remarks about the API - can we still brainstorm the expected interface?

I would love something like

for await (const actor of await client.store().list()) {
    log(actor.name); // logs all the Actors' names
}

and

for await (const actor of await client.store().list({ limit: 10 })) {
    log(actor.name); // logs only the first 10 Actors' names
}

Alternatively, we could even attempt something like

- for await (const actor of await client.store().list()) {
+ for await (const actor of client.store().list()) {

if we can attach the async iterator to the promise (I suppose we can, but I'm not sure how happy TypeScript would be about this).


(Internally, all those solutions would still lazy-load some optimal-sized pages with the correct offsets.)

@B4nan
Member

B4nan commented Nov 21, 2025

for await (const actor of client.store().list()) {

Yes, this is exactly what I would like to see, but I am not entirely sure if it's doable. The list method would have to be an async generator itself, not an async function returning one, which would likely break the current API. We'd probably need a different method, which is feasible, but it would probably mean quite a lot of added code.

@barjin
Member

barjin commented Nov 21, 2025

The list method would have to be an async generator itself, not an async function returning one

for await requires the iterable to be of type AsyncIterable, which you can return from a function. The function could actually return Promise & AsyncIterable like this (see list() implementation):

type IterablePromise<TItem> = Promise<Iterable<TItem>> & AsyncIterable<TItem>;

async function fetchData() {
    await new Promise(res => setTimeout(res, 1000));
    return ['data1', 'data2', 'data3'];
}

function list(): IterablePromise<string> {
    const itemsPromise = fetchData();

    async function* asyncGenerator() {
        const items = await itemsPromise;
        for (const item of items) {
            yield item;
        }
    }

    Object.defineProperty(itemsPromise, Symbol.asyncIterator, {
        value: asyncGenerator
    });

    return itemsPromise as any;
}

async function main() {
    // treat the return value as Promise<string[]>
    for (const item of await list()) {
        console.log(item);
    }

    // treat the return value as AsyncIterable<string>
    for await (const item of list()) {
        console.log(item);
    }
}

main();

The only issue is that TypeScript requires the return value of an async function to be a Promise<T> (ts(1064)), but you can make do with .then() callbacks etc just fine (see example above).
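
As a variation on the example above, the same idea can be typed without the `as any` cast by using `Object.assign`. This is a sketch only: `toIterablePromise` is a hypothetical helper name, and `IterablePromise` is simplified here to wrap an array.

```typescript
type IterablePromise<TItem> = Promise<TItem[]> & AsyncIterable<TItem>;

// Hypothetical helper: attaches an async iterator to an existing promise.
function toIterablePromise<TItem>(itemsPromise: Promise<TItem[]>): IterablePromise<TItem> {
    return Object.assign(itemsPromise, {
        async *[Symbol.asyncIterator](): AsyncGenerator<TItem> {
            // Wait for the promise, then yield its items one by one.
            yield* await itemsPromise;
        },
    });
}

// Not declared `async`, so TypeScript does not require the return type to be a bare Promise<T>.
function list(): IterablePromise<string> {
    return toIterablePromise(Promise.resolve(['data1', 'data2', 'data3']));
}
```

Both `for (const item of await list())` and `for await (const item of list())` work with this return value.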

@Pijukatel
Contributor Author

for await requires the iterable to be of type AsyncIterable, which you can return from a function. The function could actually return Promise & AsyncIterable like this (see list() implementation):

Updated accordingly.

@Pijukatel Pijukatel requested a review from B4nan November 24, 2025 15:00
Comment on lines 34 to 36
let itemsFetched = currentPage.items.length;
let currentLimit = options.limit !== undefined ? options.limit - itemsFetched : undefined;
let currentOffset = (options.offset ?? 0) + itemsFetched;
Member

@barjin barjin Nov 24, 2025

I have a feeling this could be simplified using the pagination fields from the response.

How about e.g.

const isLastPage = Math.min(page.total, options.limit + options.offset) <= page.count + page.offset

This works because:

$n = \text{page.count} + \text{page.offset}$ tells us that the current page includes the $n$-th list item. If $n = \text{page.total}$, we've seen the entire list.

The options.limit and options.offset options select an interval of the list between

$$(\text{options.offset}, \text{options.offset} + \text{options.limit}]$$

If the current page includes the $n$-th item, where $n = \text{options.offset} + \text{options.limit}$, it's the last page containing any items from the selected interval.

Then we could IMO do

let page = await fetchPage({ limit: options.limit, offset: options.offset });
yield* page.items;

/// The variable names are rather for explanation, not production ready
const lastItemIndex = Math.min(page.total, options.limit + options.offset);
let lastItemIndexFromThePreviousPage = page.count + page.offset;

while (lastItemIndexFromThePreviousPage < lastItemIndex) {
    const remainingItemCount = lastItemIndex - lastItemIndexFromThePreviousPage;
    page = await fetchPage({ limit: remainingItemCount, offset: lastItemIndexFromThePreviousPage });
    lastItemIndexFromThePreviousPage = page.count + page.offset;

    yield* page.items;
}

Having typed this out, I'm no longer convinced this is a better way... but feel free to get inspiration 😅
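
To make the `isLastPage` condition above concrete, here is a small worked example. The numbers are made up, `PageInfo` and the function wrapper are illustrative names, and the sketch assumes that both `options.limit` and `options.offset` are set.

```typescript
interface PageInfo {
    total: number;
    count: number;
    offset: number;
}

// The condition from the comment above, wrapped in a function.
function isLastPage(page: PageInfo, options: { offset: number; limit: number }): boolean {
    return Math.min(page.total, options.limit + options.offset) <= page.count + page.offset;
}

// A list of 2500 items; the caller asks for 1500 items starting at offset 500,
// so the last wanted item is the 2000th one.
const options = { offset: 500, limit: 1500 };

// First page covers items up to the 1500th: not the last page yet.
console.log(isLastPage({ total: 2500, count: 1000, offset: 500 }, options)); // false

// Second page covers items up to the 2000th: this is the last page.
console.log(isLastPage({ total: 2500, count: 500, offset: 1500 }, options)); // true
```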

Contributor Author

Regarding the pagination fields from the response: according to the documentation, they are not defined on all list endpoints. I think we can rely only on total and items. (Previously, I was under the impression that even total was optional, but after re-checking the documentation, it seems it is always there.)

Here is an example of the minimal endpoint: https://docs.apify.com/api/v2/act-versions-get
(Tested and it does really return just items and total)

Therefore, I would define the algorithm in a way that does not require any optional fields from the response. But knowing I can get total allows at least some of the improvements you suggested.

I would also keep the currentPage.items.length > 0 check in the while condition, to handle any API problems or situations where the requested resources change during the iteration due to some external action. (Otherwise, someone removing an actor could lead to an infinite loop.)

const lastItemIndex = Math.min(page.total, options.limit + options.offset);
Since options.limit and options.offset are optional, this would not work for a user who defines just an offset.
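
A loop along the lines described in this comment could look like the sketch below. It reads only `items` and `total` from each response and keeps the empty-page guard. It is an illustration under assumptions, not the merged implementation: `MinimalPage`, `FetchPage` and `iterateUsingTotalOnly` are hypothetical names.

```typescript
interface MinimalPage<T> {
    items: T[];
    total: number;
}

// Hypothetical signature of a function that performs one list request.
type FetchPage<T> = (offset: number, limit?: number) => Promise<MinimalPage<T>>;

async function* iterateUsingTotalOnly<T>(
    fetchPage: FetchPage<T>,
    options: { offset?: number; limit?: number } = {},
): AsyncGenerator<T> {
    const start = options.offset ?? 0;
    let fetched = 0;
    let page = await fetchPage(start, options.limit);

    // The empty-page guard ends the loop even if `total` is stale or wrong.
    while (page.items.length > 0) {
        yield* page.items;
        fetched += page.items.length;

        // Works whether or not the caller passed a limit.
        const wanted = Math.min(page.total - start, options.limit ?? Infinity);
        if (fetched >= wanted) break;

        const remainingLimit = options.limit === undefined ? undefined : options.limit - fetched;
        page = await fetchPage(start + fetched, remainingLimit);
    }
}
```

If the backend reports a `total` larger than the number of items it can actually return, for example because something was deleted mid-iteration, the loop stops on the first empty page instead of spinning forever.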

Member

@barjin barjin Nov 25, 2025

https://api.apify.com/v2/acts/.../versions is not a paginated endpoint though; it just lists all the items under items and the length of this array under total (see implementation here). Note that it doesn't react to the offset or limit query params.

All paginated endpoints will use the Pagination class, which means those will always accept all the parameters (and return all the fields).

Contributor Author

I made it in a way that can be applied to all the classes that were internally calling ResourceCollectionClient._list, even the "fake paginated endpoints".

@Pijukatel Pijukatel requested a review from barjin November 25, 2025 11:48
@Pijukatel
Contributor Author

Pijukatel commented Nov 26, 2025

Manual integration tests on all the modified endpoints worked fine on my account with one selected actor:

...
const clients=[
    apifyClient.datasets(),
    apifyClient.requestQueues(),
    apifyClient.keyValueStores(),
    apifyClient.actors(),
    apifyClient.actor(someActorId).versions(),
    apifyClient.actor(someActorId).version('0.0').envVars(),
    apifyClient.actor(someActorId).builds(),
    apifyClient.actor(someActorId).runs(),
    apifyClient.schedules(),
    apifyClient.store(),
    apifyClient.tasks(),
    apifyClient.webhooks(),
    apifyClient.webhookDispatches()
]


for (const client of clients) {
    const items = [];
    for await (const item of client.list()) {
        items.push(item);
    }
    console.log(client.constructor.name)
    console.log(items.length);
}

@Pijukatel Pijukatel changed the title feat: Add asyncIterator to the StoreCollectionClient.list return value feat: Make all collectionClient.list method return also asyncIterator of relevant data Nov 26, 2025
@Pijukatel Pijukatel marked this pull request as ready for review November 26, 2025 13:02
@Pijukatel Pijukatel changed the title feat: Make all collectionClient.list method return also asyncIterator of relevant data feat: Make all collectionClient.list methods return also asyncIterator of relevant data Nov 26, 2025
@Pijukatel Pijukatel changed the title feat: Make all collectionClient.list methods return also asyncIterator of relevant data feat: Make all collectionClient.list methods also return asyncIterator of relevant data Nov 26, 2025
@Pijukatel Pijukatel changed the title feat: Make all collectionClient.list methods also return asyncIterator of relevant data feat: Make all collectionClient.list methods return value also be asyncIterator of relevant data Nov 26, 2025
Member

@barjin barjin left a comment

Looks good! I'd still like to groom the codebase a little, see my comments ⬆️

exclusiveStartId?: string;
}

export interface PaginationOptions {
Member

I'm wondering whether all the XYZOptions interfaces could now extend this interface to reduce the duplicated code.

see e.g.

export interface WebhookDispatchCollectionListOptions {
    limit?: number;
    offset?: number;
    desc?: boolean;
}
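
The suggestion above could look roughly like this. It is a sketch: `PaginationOptions` is reduced to the two fields discussed in this review, and `exampleOptions` exists only to show that the inherited fields are still accepted.

```typescript
interface PaginationOptions {
    /** Position of the first returned entry. */
    offset?: number;
    /** Maximum number of entries requested. */
    limit?: number;
}

// Only the endpoint-specific option remains in the derived interface.
interface WebhookDispatchCollectionListOptions extends PaginationOptions {
    desc?: boolean;
}

const exampleOptions: WebhookDispatchCollectionListOptions = { limit: 10, offset: 20, desc: true };
```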

Contributor Author

@Pijukatel Pijukatel Nov 27, 2025

Some of these interfaces are wrong. For example:

export interface ActorEnvVarCollectionListOptions {
    limit?: number;
    offset?: number;
    desc?: boolean;
}

But the endpoint does not take any parameters (tested, and the documentation does not mention any parameters either):

https://docs.apify.com/api/v2/act-version-env-vars-get

Contributor Author

I applied it to all the options and raised an issue with the platform people about the unused options
https://apify.slack.com/archives/C01VBUV81UZ/p1764238228809389

By the way, StoreCollectionListOptions is the only one there that does not have the desc option :-(

  list(
      options: WebhookCollectionListOptions = {},
- ): Promise<PaginatedList<Omit<Webhook, 'payloadTemplate' | 'headersTemplate'>>> {
+ ): Promise<PaginatedList<Omit<Webhook, 'payloadTemplate' | 'headersTemplate'>>> &
Member

Maybe we could have a generic utility type, sth like

type PaginatedListIterable<T> = Promise<PaginatedList<T>> & AsyncIterable<T>

to reduce the amount of code and simplify adding new endpoints?

Member

Now I saw IterablePaginatedList<Data> from utils.ts, why not use that?

Contributor Author

That would work for the normal ones, but there are these weird ones that I am not sure how to fit into such a new type.

Normal one:
Promise<PaginatedList<DatasetCollectionClientListResult>> & AsyncIterable<DatasetCollectionClientListResult> == PaginatedListIterable<DatasetCollectionClientListResult>

Weird ones:

  • Promise<Pick<PaginatedList<ActorEnvironmentVariable>, 'total' | 'items'>> & AsyncIterable<ActorEnvironmentVariable>
  • Promise<Pick<PaginatedList<FinalActorVersion>, 'total' | 'items'>> & AsyncIterable<FinalActorVersion>
  • Promise<PaginatedList<Omit<KeyValueStore, 'stats'> & { username?: string }>> & AsyncIterable<KeyValueStore>
  • Promise<PaginatedList<RequestQueue & { username?: string }> & { unnamed: boolean;}> & AsyncIterable<RequestQueue>

Contributor Author

@Pijukatel Pijukatel Nov 27, 2025

I applied it to the normal ones, but the special ones will stay special.

I have no strong opinion on the name.
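
One possible way to also cover the "weird" signatures, shown purely as an illustration and not as what the PR does, is a second type parameter for the awaited list shape. The two-parameter `PaginatedListIterable`, the simplified `PaginatedList` and `listVersions` below are assumptions made up for this sketch.

```typescript
// Simplified stand-in for the client's PaginatedList type.
interface PaginatedList<T> {
    total: number;
    count: number;
    offset: number;
    limit: number;
    desc: boolean;
    items: T[];
}

// Hypothetical variant: the awaited shape defaults to PaginatedList<TItem>
// but can be overridden for endpoints that return a reduced shape.
type PaginatedListIterable<TItem, TList = PaginatedList<TItem>> = Promise<TList> & AsyncIterable<TItem>;

// "Normal" endpoint: the default applies.
type NormalResult = PaginatedListIterable<{ id: string }>;

// "Weird" endpoint: only `total` and `items` are returned.
type MinimalResult = PaginatedListIterable<{ name: string }, Pick<PaginatedList<{ name: string }>, 'total' | 'items'>>;

// Fake list method returning the reduced shape.
function listVersions(): MinimalResult {
    const promise = Promise.resolve({ total: 2, items: [{ name: '0.0' }, { name: '0.1' }] });
    return Object.assign(promise, {
        async *[Symbol.asyncIterator]() {
            yield* (await promise).items;
        },
    });
}
```

The signatures where the iterated item type differs from the listed item type (the KeyValueStore and RequestQueue cases above) would still need both parameters spelled out.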

Member

@B4nan B4nan left a comment

few questions and ideas, otherwise it looks great!

async function* asyncGenerator() {
    let currentPage = await paginatedListPromise;
    yield* currentPage.items;
    const offset = options.offset || 0;
Member

Suggested change
- const offset = options.offset || 0;
+ const offset = options.offset ?? 0;

(not that it would matter much in here)

Contributor Author

Done
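
For context on why the suggestion above makes no practical difference for an offset, here is a short comparison of the two operators. `withOr` and `withNullish` are illustrative helper names.

```typescript
// `||` falls back on any falsy value (0, '', NaN, null, undefined);
// `??` falls back only on null and undefined.
function withOr(value: number | undefined, fallback: number): number {
    return value || fallback;
}

function withNullish(value: number | undefined, fallback: number): number {
    return value ?? fallback;
}

// With a fallback of 0, an explicit 0 gives the same result either way.
console.log(withOr(0, 0), withNullish(0, 0)); // 0 0

// With a non-zero fallback, an explicit 0 is preserved only by `??`.
console.log(withOr(0, 1000), withNullish(0, 1000)); // 1000 0
```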

- async list(options: ActorCollectionListOptions = {}): Promise<ActorCollectionListResult> {
+ list(
+     options: ActorCollectionListOptions = {},
+ ): Promise<PaginatedList<ActorCollectionListItem>> & AsyncIterable<ActorCollectionListItem> {

This comment was marked as duplicate.

Member

Let's merge this thread with #790 (comment)

Contributor

@janbuchar janbuchar left a comment

I like it!

Member

@barjin barjin left a comment

Lgtm, thank you!

I noticed a few discrepancies in the (existing) types during the review, so I made a separate issue (let's merge this).

@Pijukatel Pijukatel merged commit f855fd4 into master Nov 27, 2025
7 checks passed
@Pijukatel Pijukatel deleted the paginated-list-iterator branch November 27, 2025 14:18
@jancurn
Member

jancurn commented Nov 27, 2025

Great stuff! How about we add some example to the main docs (not just function/class comments)?

@Pijukatel
Contributor Author

Great stuff! How about we add some example to the main docs (not just function/class comments)?

Yes, I think this example could be updated: https://docs.apify.com/api/client/js/docs#pagination

But first, I have to explore a little how this change should be applied to list methods of StorageClients, which are a little different compared to the CollectionClients affected by this PR.

Development

Successfully merging this pull request may close these issues.

Add better support for pagination

6 participants